Overview

Dataset statistics

Number of variables: 11
Number of observations: 19020
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 115
Duplicate rows (%): 0.6%
Total size in memory: 1.7 MiB
Average record size in memory: 96.0 B

Variable types

Numeric: 10
Categorical: 1

Alerts

Dataset has 115 (0.6%) duplicate rows (Duplicates)
fLength is highly correlated with fWidth and 3 other fields (High correlation)
fWidth is highly correlated with fLength and 3 other fields (High correlation)
fSize is highly correlated with fLength and 3 other fields (High correlation)
fConc is highly correlated with fLength and 3 other fields (High correlation)
fConc1 is highly correlated with fLength and 3 other fields (High correlation)
fLength is highly correlated with fWidth and 7 other fields (High correlation)
fWidth is highly correlated with fLength and 6 other fields (High correlation)
fSize is highly correlated with fLength and 4 other fields (High correlation)
fConc is highly correlated with fLength and 4 other fields (High correlation)
fConc1 is highly correlated with fLength and 4 other fields (High correlation)
fAsym is highly correlated with fLength and 2 other fields (High correlation)
fM3Long is highly correlated with fLength and 5 other fields (High correlation)
fM3Trans is highly correlated with fLength and 1 other field (High correlation)
fAlpha is highly correlated with class (High correlation)
fDist is highly correlated with fLength (High correlation)
class is highly correlated with fAlpha (High correlation)

Reproduction

Analysis started: 2022-04-15 18:03:28.128764
Analysis finished: 2022-04-15 18:03:55.001714
Duration: 26.87 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json
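
A report of this kind could be regenerated roughly as sketched below. The sketch assumes the data sits in a header-less CSV file and that pandas and pandas-profiling are installed; the function name build_report and both path arguments are hypothetical, and the exact configuration used for this report (config.json) is not reproduced here.

```python
# Column names as they appear in this report; treating the file as
# header-less is an assumption.
COLUMNS = ["fLength", "fWidth", "fSize", "fConc", "fConc1",
           "fAsym", "fM3Long", "fM3Trans", "fAlpha", "fDist", "class"]


def build_report(csv_path, html_path):
    """Profile the CSV at csv_path and write an HTML report to html_path."""
    # Imported lazily so the module still loads where the packages are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path, names=COLUMNS)
    report = ProfileReport(df, title="Pandas Profiling Report")
    report.to_file(html_path)
    return report
```

Calling build_report with the path of the data file and an output path such as "report.html" would write the HTML file; default settings may differ from those behind the numbers shown here.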

Variables

fLength
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 18643
Distinct (%): 98.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.25015393
Minimum: 4.2835
Maximum: 334.177
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 297.2 KiB

Quantile statistics

Minimum: 4.2835
5-th percentile: 16.433655
Q1: 24.336
median: 37.1477
Q3: 70.122175
95-th percentile: 139.72515
Maximum: 334.177
Range: 329.8935
Interquartile range (IQR): 45.786175

Descriptive statistics

Standard deviation: 42.36485494
Coefficient of variation (CV): 0.7955818306
Kurtosis: 4.970441241
Mean: 53.25015393
Median Absolute Deviation (MAD): 16.32565
Skewness: 2.013652324
Sum: 1012817.928
Variance: 1794.780934
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
20.7522               3      < 0.1%
24.8332               3      < 0.1%
26.9187               3      < 0.1%
19.1572               3      < 0.1%
12.9176               3      < 0.1%
98.2968               2      < 0.1%
32.2999               2      < 0.1%
84.5714               2      < 0.1%
24.8952               2      < 0.1%
12.4763               2      < 0.1%
Other values (18633)  18995  99.9%

Value                 Count  Frequency (%)
4.2835                1      < 0.1%
7.2079                1      < 0.1%
7.3606                1      < 0.1%
8.0518                1      < 0.1%
8.2304                1      < 0.1%
8.2311                1      < 0.1%
8.4802                1      < 0.1%
8.5738                1      < 0.1%
8.601                 1      < 0.1%
8.6998                1      < 0.1%

Value                 Count  Frequency (%)
334.177               1      < 0.1%
310.61                1      < 0.1%
305.422               1      < 0.1%
305.324               1      < 0.1%
305.0961              1      < 0.1%
303.5676              1      < 0.1%
303.2787              1      < 0.1%
299.9304              1      < 0.1%
297.1239              1      < 0.1%
295.672               1      < 0.1%
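
Several of the fLength figures are derived from one another: the range is maximum minus minimum, the IQR is Q3 minus Q1, the CV is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The short check below recomputes them from the values printed in this report; the variable names are illustrative only, and small differences in the last digits come from rounding in the printed inputs.

```python
# Figures copied from the fLength section of this report.
n_observations = 19020
minimum, maximum = 4.2835, 334.177
q1, q3 = 24.336, 70.122175
mean, std = 53.25015393, 42.36485494

value_range = maximum - minimum   # reported as 329.8935
iqr = q3 - q1                     # reported as 45.786175
cv = std / mean                   # reported as 0.7955818306
variance = std ** 2               # reported as 1794.780934
total = mean * n_observations     # reported as 1012817.928

print(value_range, iqr, cv, variance, total)
```

The same relations hold for the other numeric variables; for fAsym the CV is negative only because its mean is negative.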

fWidth
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 18200
Distinct (%): 95.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.18096622
Minimum: 0
Maximum: 256.382
Zeros: 98
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 297.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.4005
Q1: 11.8638
median: 17.1399
Q3: 24.739475
95-th percentile: 58.479245
Maximum: 256.382
Range: 256.382
Interquartile range (IQR): 12.875675

Descriptive statistics

Standard deviation: 18.3460563
Coefficient of variation (CV): 0.8271080761
Kurtosis: 16.76540668
Mean: 22.18096622
Median Absolute Deviation (MAD): 5.87145
Skewness: 3.371627981
Sum: 421881.9775
Variance: 336.5777816
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
0                     98     0.5%
10.7539               4      < 0.1%
0.0001                3      < 0.1%
10.0342               3      < 0.1%
15.8644               3      < 0.1%
0.0029                3      < 0.1%
9.5513                3      < 0.1%
0.0033                3      < 0.1%
20.2021               3      < 0.1%
12.8155               3      < 0.1%
Other values (18190)  18894  99.3%

Value                 Count  Frequency (%)
0                     98     0.5%
0.0001                3      < 0.1%
0.0002                1      < 0.1%
0.0006                1      < 0.1%
0.0019                1      < 0.1%
0.0025                2      < 0.1%
0.0026                2      < 0.1%
0.0027                1      < 0.1%
0.0028                3      < 0.1%
0.0029                3      < 0.1%

Value                 Count  Frequency (%)
256.382               1      < 0.1%
228.0385              1      < 0.1%
220.5144              1      < 0.1%
201.364               1      < 0.1%
190.5432              1      < 0.1%
190.139               1      < 0.1%
188.8866              1      < 0.1%
186.928               1      < 0.1%
179.2924              1      < 0.1%
177.782               1      < 0.1%

fSize
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7228
Distinct (%): 38.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.825016961
Minimum: 1.9413
Maximum: 5.3233
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 297.2 KiB

Quantile statistics

Minimum: 1.9413
5-th percentile: 2.1945
Q1: 2.4771
median: 2.7396
Q3: 3.1016
95-th percentile: 3.71575
Maximum: 5.3233
Range: 3.382
Interquartile range (IQR): 0.6245

Descriptive statistics

Standard deviation: 0.4725986487
Coefficient of variation (CV): 0.1672905527
Kurtosis: 0.7272784359
Mean: 2.825016961
Median Absolute Deviation (MAD): 0.29895
Skewness: 0.8755071709
Sum: 53731.8226
Variance: 0.2233494827
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
2.1508                27     0.1%
2.1287                24     0.1%
2.0774                24     0.1%
2.1319                23     0.1%
2.1414                22     0.1%
2.3139                22     0.1%
2.1351                22     0.1%
2.3936                21     0.1%
2.29                  21     0.1%
2.3589                20     0.1%
Other values (7218)   18794  98.8%

Value                 Count  Frequency (%)
1.9413                1      < 0.1%
1.9468                1      < 0.1%
1.9916                1      < 0.1%
1.9978                1      < 0.1%
2.0022                1      < 0.1%
2.0065                2      < 0.1%
2.0107                3      < 0.1%
2.0149                4      < 0.1%
2.0191                1      < 0.1%
2.0233                8      < 0.1%

Value                 Count  Frequency (%)
5.3233                1      < 0.1%
5.1795                1      < 0.1%
5.1467                1      < 0.1%
5.0118                1      < 0.1%
5.01                  1      < 0.1%
4.9946                1      < 0.1%
4.9518                1      < 0.1%
4.9369                1      < 0.1%
4.905                 1      < 0.1%
4.8501                1      < 0.1%

fConc
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6410
Distinct (%): 33.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3803270715
Minimum: 0.0131
Maximum: 0.893
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 297.2 KiB

Quantile statistics

Minimum: 0.0131
5-th percentile: 0.1263
Q1: 0.2358
median: 0.35415
Q3: 0.5037
95-th percentile: 0.734205
Maximum: 0.893
Range: 0.8799
Interquartile range (IQR): 0.2679

Descriptive statistics

Standard deviation: 0.1828131472
Coefficient of variation (CV): 0.4806735069
Kurtosis: -0.5212970988
Mean: 0.3803270715
Median Absolute Deviation (MAD): 0.13025
Skewness: 0.4858884539
Sum: 7233.8209
Variance: 0.0334206468
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
0.6                   16     0.1%
0.4                   12     0.1%
0.4116                12     0.1%
0.2979                12     0.1%
0.2175                11     0.1%
0.2214                11     0.1%
0.5                   11     0.1%
0.6154                11     0.1%
0.193                 11     0.1%
0.2408                11     0.1%
Other values (6400)   18902  99.4%

Value                 Count  Frequency (%)
0.0131                1      < 0.1%
0.0133                1      < 0.1%
0.0137                1      < 0.1%
0.0139                2      < 0.1%
0.0158                1      < 0.1%
0.0162                1      < 0.1%
0.0171                1      < 0.1%
0.0188                1      < 0.1%
0.0196                1      < 0.1%
0.0206                1      < 0.1%

Value                 Count  Frequency (%)
0.893                 1      < 0.1%
0.8912                1      < 0.1%
0.8889                1      < 0.1%
0.8846                1      < 0.1%
0.8786                1      < 0.1%
0.8778                1      < 0.1%
0.8772                1      < 0.1%
0.8757                1      < 0.1%
0.8745                1      < 0.1%
0.8743                1      < 0.1%

fConc1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 4421
Distinct (%): 23.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2146571346
Minimum: 0.0003
Maximum: 0.6752
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 297.2 KiB

Quantile statistics

Minimum: 0.0003
5-th percentile: 0.066995
Q1: 0.128475
median: 0.1965
Q3: 0.285225
95-th percentile: 0.42241
Maximum: 0.6752
Range: 0.6749
Interquartile range (IQR): 0.15675

Descriptive statistics

Standard deviation: 0.1105107989
Coefficient of variation (CV): 0.5148247185
Kurtosis: 0.0293910244
Mean: 0.2146571346
Median Absolute Deviation (MAD): 0.0754
Skewness: 0.6856946259
Sum: 4082.7787
Variance: 0.01221263667
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
0.194                 18     0.1%
0.2126                16     0.1%
0.1939                16     0.1%
0.2                   16     0.1%
0.217                 15     0.1%
0.2251                15     0.1%
0.1515                14     0.1%
0.1504                14     0.1%
0.1279                14     0.1%
0.1568                14     0.1%
Other values (4411)   18868  99.2%

Value                 Count  Frequency (%)
0.0003                1      < 0.1%
0.0008                1      < 0.1%
0.0011                1      < 0.1%
0.0015                1      < 0.1%
0.002                 1      < 0.1%
0.0047                1      < 0.1%
0.005                 1      < 0.1%
0.0072                1      < 0.1%
0.0073                1      < 0.1%
0.0076                1      < 0.1%

Value                 Count  Frequency (%)
0.6752                1      < 0.1%
0.674                 1      < 0.1%
0.643                 1      < 0.1%
0.637                 1      < 0.1%
0.6296                1      < 0.1%
0.6283                1      < 0.1%
0.6264                1      < 0.1%
0.6242                1      < 0.1%
0.6224                1      < 0.1%
0.6204                1      < 0.1%

fAsym
Real number (ℝ)

HIGH CORRELATION

Distinct: 18704
Distinct (%): 98.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -4.331745158
Minimum: -457.9161
Maximum: 575.2407
Zeros: 41
Zeros (%): 0.2%
Negative: 8448
Negative (%): 44.4%
Memory size: 297.2 KiB

Quantile statistics

Minimum: -457.9161
5-th percentile: -111.1947
Q1: -20.58655
median: 4.01305
Q3: 24.0637
95-th percentile: 65.544125
Maximum: 575.2407
Range: 1033.1568
Interquartile range (IQR): 44.65025

Descriptive statistics

Standard deviation: 59.20606198
Coefficient of variation (CV): -13.66794671
Kurtosis: 8.155329763
Mean: -4.331745158
Median Absolute Deviation (MAD): 21.68065
Skewness: -1.046441472
Sum: -82389.7929
Variance: 3505.357776
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
0                     41     0.2%
-0.0001               7      < 0.1%
8.8077                3      < 0.1%
7.1088                3      < 0.1%
-1.4761               3      < 0.1%
-0.5062               3      < 0.1%
15                    2      < 0.1%
36.6631               2      < 0.1%
-2.0651               2      < 0.1%
58.6184               2      < 0.1%
Other values (18694)  18952  99.6%

Value                 Count  Frequency (%)
-457.9161             1      < 0.1%
-449.9526             1      < 0.1%
-382.594              1      < 0.1%
-381.734              1      < 0.1%
-378.9457             1      < 0.1%
-368.633              1      < 0.1%
-363.3382             1      < 0.1%
-353.934              1      < 0.1%
-353.26               1      < 0.1%
-349.757              1      < 0.1%

Value                 Count  Frequency (%)
575.2407              1      < 0.1%
473.0654              1      < 0.1%
464.631               1      < 0.1%
444.401               1      < 0.1%
433.0957              1      < 0.1%
402.925               1      < 0.1%
402.1863              1      < 0.1%
400.284               1      < 0.1%
396.3379              1      < 0.1%
384.3477              1      < 0.1%

fM3Long
Real number (ℝ)

HIGH CORRELATION

Distinct: 18693
Distinct (%): 98.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.54554482
Minimum: -331.78
Maximum: 238.321
Zeros: 39
Zeros (%): 0.2%
Negative: 6604
Negative (%): 34.7%
Memory size: 297.2 KiB

Quantile statistics

Minimum: -331.78
5-th percentile: -80.28369
Q1: -12.842775
median: 15.3141
Q3: 35.8378
95-th percentile: 83.07177
Maximum: 238.321
Range: 570.101
Interquartile range (IQR): 48.680575

Descriptive statistics

Standard deviation: 51.00011801
Coefficient of variation (CV): 4.836176689
Kurtosis: 4.670973798
Mean: 10.54554482
Median Absolute Deviation (MAD): 25.33365
Skewness: -1.123078055
Sum: 200576.2624
Variance: 2601.012037
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
0                     39     0.2%
-0.0001               4      < 0.1%
16.0747               3      < 0.1%
-10.7301              2      < 0.1%
20.1723               2      < 0.1%
-18.5409              2      < 0.1%
54.47                 2      < 0.1%
22.649                2      < 0.1%
-18.2535              2      < 0.1%
14.9656               2      < 0.1%
Other values (18683)  18960  99.7%

Value                 Count  Frequency (%)
-331.78               1      < 0.1%
-318.3002             1      < 0.1%
-297.1717             1      < 0.1%
-293.1762             1      < 0.1%
-287.5067             1      < 0.1%
-287.3636             1      < 0.1%
-284.7038             1      < 0.1%
-281.9541             1      < 0.1%
-281.844              1      < 0.1%
-281.435              1      < 0.1%

Value                 Count  Frequency (%)
238.321               1      < 0.1%
231.446               1      < 0.1%
227.8174              1      < 0.1%
226.3506              1      < 0.1%
222.417               1      < 0.1%
217.934               1      < 0.1%
217.624               1      < 0.1%
216.985               1      < 0.1%
215.894               1      < 0.1%
203.863               1      < 0.1%

fM3Trans
Real number (ℝ)

HIGH CORRELATION

Distinct: 18390
Distinct (%): 96.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2497259569
Minimum: -205.8947
Maximum: 179.851
Zeros: 59
Zeros (%): 0.3%
Negative: 9404
Negative (%): 49.4%
Memory size: 297.2 KiB

Quantile statistics

Minimum: -205.8947
5-th percentile: -25.76384
Q1: -10.849375
median: 0.6662
Q3: 10.946425
95-th percentile: 26.99851
Maximum: 179.851
Range: 385.7457
Interquartile range (IQR): 21.7958

Descriptive statistics

Standard deviation: 20.82743895
Coefficient of variation (CV): 83.40117786
Kurtosis: 8.580352473
Mean: 0.2497259569
Median Absolute Deviation (MAD): 10.888
Skewness: 0.1201212735
Sum: 4749.7877
Variance: 433.7822131
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
0                     59     0.3%
-0.0001               24     0.1%
0.0001                18     0.1%
-5.4454               3      < 0.1%
-7.6601               3      < 0.1%
11.1602               3      < 0.1%
6.1829                3      < 0.1%
-8.975                3      < 0.1%
10.9015               3      < 0.1%
9.5231                3      < 0.1%
Other values (18380)  18898  99.4%

Value                 Count  Frequency (%)
-205.8947             1      < 0.1%
-164.14               1      < 0.1%
-149.5513             1      < 0.1%
-142.5894             1      < 0.1%
-142.119              1      < 0.1%
-135.5051             1      < 0.1%
-134.75               1      < 0.1%
-134.395              1      < 0.1%
-133.1359             1      < 0.1%
-132.416              1      < 0.1%

Value                 Count  Frequency (%)
179.851               1      < 0.1%
170.692               1      < 0.1%
163.2697              1      < 0.1%
154.865               1      < 0.1%
143.8753              1      < 0.1%
139.2361              1      < 0.1%
132.589               1      < 0.1%
132.388               1      < 0.1%
131.5547              1      < 0.1%
130.8545              1      < 0.1%

fAlpha
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 17981
Distinct (%): 94.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.64570668
Minimum: 0
Maximum: 90
Zeros: 5
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 297.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.933285
Q1: 5.547925
median: 17.6795
Q3: 45.88355
95-th percentile: 80.72654
Maximum: 90
Range: 90
Interquartile range (IQR): 40.335625

Descriptive statistics

Standard deviation: 26.10362051
Coefficient of variation (CV): 0.9442196872
Kurtosis: -0.5337036036
Mean: 27.64570668
Median Absolute Deviation (MAD): 14.6924
Skewness: 0.8508898774
Sum: 525821.341
Variance: 681.3990037
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
0.0002                7      < 0.1%
0                     5      < 0.1%
0.386                 4      < 0.1%
1.29                  4      < 0.1%
90                    4      < 0.1%
0.804                 4      < 0.1%
0.256                 4      < 0.1%
3.4161                4      < 0.1%
2.76                  4      < 0.1%
2.701                 4      < 0.1%
Other values (17971)  18976  99.8%

Value                 Count  Frequency (%)
0                     5      < 0.1%
0.0002                7      < 0.1%
0.0003                2      < 0.1%
0.001                 1      < 0.1%
0.0031                1      < 0.1%
0.0056                1      < 0.1%
0.0086                1      < 0.1%
0.009                 1      < 0.1%
0.0097                1      < 0.1%
0.0103                1      < 0.1%

Value                 Count  Frequency (%)
90                    4      < 0.1%
89.9798               1      < 0.1%
89.9579               1      < 0.1%
89.9535               1      < 0.1%
89.9528               1      < 0.1%
89.9229               1      < 0.1%
89.9155               1      < 0.1%
89.9087               1      < 0.1%
89.9076               1      < 0.1%
89.9042               1      < 0.1%

fDist
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 18437
Distinct (%): 96.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 193.8180265
Minimum: 1.2826
Maximum: 495.561
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 297.2 KiB

Quantile statistics

Minimum: 1.2826
5-th percentile: 71.41369
Q1: 142.49225
median: 191.85145
Q3: 240.563825
95-th percentile: 326.659975
Maximum: 495.561
Range: 494.2784
Interquartile range (IQR): 98.071575

Descriptive statistics

Standard deviation: 74.73178696
Coefficient of variation (CV): 0.3855770711
Kurtosis: -0.112576594
Mean: 193.8180265
Median Absolute Deviation (MAD): 49.0165
Skewness: 0.2295873764
Sum: 3686418.863
Variance: 5584.839983
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
182.013               3      < 0.1%
227.107               3      < 0.1%
265.238               3      < 0.1%
146.354               3      < 0.1%
186.828               3      < 0.1%
168.774               3      < 0.1%
216.032               3      < 0.1%
187.651               3      < 0.1%
148.372               3      < 0.1%
100.395               3      < 0.1%
Other values (18427)  18990  99.8%

Value                 Count  Frequency (%)
1.2826                1      < 0.1%
5.5449                1      < 0.1%
5.5922                1      < 0.1%
5.6998                1      < 0.1%
5.7456                1      < 0.1%
6.564                 1      < 0.1%
6.6852                1      < 0.1%
9.1574                1      < 0.1%
13.1108               1      < 0.1%
14.0229               1      < 0.1%

Value                 Count  Frequency (%)
495.561               1      < 0.1%
466.4078              1      < 0.1%
450.953               1      < 0.1%
450.402               1      < 0.1%
450.349               1      < 0.1%
448.0295              1      < 0.1%
446.488               1      < 0.1%
438.901               1      < 0.1%
438.8574              1      < 0.1%
437.477               1      < 0.1%

class
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 297.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 19020
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: g
2nd row: g
3rd row: g
4th row: g
5th row: g

Common Values

Value                 Count  Frequency (%)
g                     12332  64.8%
h                     6688   35.2%

Length

Histogram of lengths of the category

Pie chart

Value                 Count  Frequency (%)
g                     12332  64.8%
h                     6688   35.2%

Most occurring characters

Value                 Count  Frequency (%)
g                     12332  64.8%
h                     6688   35.2%

Most occurring categories

Value                 Count  Frequency (%)
Lowercase Letter      19020  100.0%

Most frequent character per category

Lowercase Letter
Value                 Count  Frequency (%)
g                     12332  64.8%
h                     6688   35.2%

Most occurring scripts

Value                 Count  Frequency (%)
Latin                 19020  100.0%

Most frequent character per script

Latin
Value                 Count  Frequency (%)
g                     12332  64.8%
h                     6688   35.2%

Most occurring blocks

Value                 Count  Frequency (%)
ASCII                 19020  100.0%

Most frequent character per block

ASCII
Value                 Count  Frequency (%)
g                     12332  64.8%
h                     6688   35.2%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
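
As a minimal sketch of that recipe, the pure-Python functions below rank both variables (tied values receive the mean of their positions, which is an assumption about tie handling) and then apply the covariance-over-standard-deviations formula to the ranks. The function names are illustrative, and the sample values are the fLength and fSize entries of the first five rows shown in the Sample section.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


f_length = [28.7967, 31.6036, 162.0520, 23.8172, 75.1362]
f_size = [2.6449, 2.5185, 4.0612, 2.3385, 3.1611]
rho = spearman(f_length, f_size)
```

Because only ranks enter the calculation, any strictly increasing transformation of either variable leaves ρ unchanged; for example, x and x cubed have ρ equal to 1 even though their relationship is not linear.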

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
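
The calculation can be sketched in a few lines of plain Python. The function name pearson_r and the toy data are illustrative; the second half of the example checks the invariance property mentioned above by rescaling and shifting one variable.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r_original = pearson_r(x, y)
# A change of location and scale in x (here 10 * x + 3) should leave r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], y)
# Steep and shallow exact lines both give r = 1: the slope does not matter.
r_steep = pearson_r(x, [10 * a for a in x])
r_shallow = pearson_r(x, [0.1 * a for a in x])
```

A negative scale factor would flip the sign of r while keeping its magnitude.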

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
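
The pair-counting definition translates directly into code. The sketch below implements exactly the formula as worded here (concordant minus discordant pairs over all pairs, often called tau-a); libraries commonly use a tie-corrected variant, so values may differ from a library's when ties are present. The function name is illustrative, and the sample values are the fLength and fSize entries of the first five rows in the Sample section.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y order the two observations the same way.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in x or y count toward the total but toward neither tally.
    return (concordant - discordant) / len(pairs)


f_length = [28.7967, 31.6036, 162.0520, 23.8172, 75.1362]
f_size = [2.6449, 2.5185, 4.0612, 2.3385, 3.1611]
tau = kendall_tau_a(f_length, f_size)
```

For these five rows, nine of the ten pairs are concordant and one is discordant, giving τ = 0.8.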

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

        fLength    fWidth   fSize   fConc  fConc1      fAsym    fM3Long  fM3Trans   fAlpha     fDist  class
0       28.7967   16.0021  2.6449  0.3918  0.1982    27.7004    22.0110   -8.2027  40.0920   81.8828  g
1       31.6036   11.7235  2.5185  0.5303  0.3773    26.2722    23.8238   -9.9574   6.3609  205.2610  g
2      162.0520  136.0310  4.0612  0.0374  0.0187   116.7410   -64.8580  -45.2160  76.9600  256.7880  g
3       23.8172    9.5728  2.3385  0.6147  0.3922    27.2107    -6.4633   -7.1513  10.4490  116.7370  g
4       75.1362   30.9205  3.1611  0.3168  0.1832    -5.5277    28.5525   21.8393   4.6480  356.4620  g
5       51.6240   21.1502  2.9085  0.2420  0.1340    50.8761    43.1887    9.8145   3.6130  238.0980  g
6       48.2468   17.3565  3.0332  0.2529  0.1515     8.5730    38.0957   10.5868   4.7920  219.0870  g
7       26.7897   13.7595  2.5521  0.4236  0.2174    29.6339    20.4560   -2.9292   0.8120  237.1340  g
8       96.2327   46.5165  4.1540  0.0779  0.0390   110.3550    85.0486   43.1844   4.8540  248.2260  g
9       46.7619   15.1993  2.5786  0.3377  0.1913    24.7548    43.8771   -6.6812   7.8750  102.2510  g

Last rows

        fLength    fWidth   fSize   fConc  fConc1      fAsym    fM3Long  fM3Trans   fAlpha     fDist  class
19010   32.4902   10.6723  2.4742  0.4664  0.2735   -27.0097   -21.1687    8.4813  69.1730  120.6680  h
19011   79.5528   44.9929  3.5488  0.1656  0.0900   -39.6213    53.7866  -30.0054  15.8075  311.5680  h
19012   31.8373   13.8734  2.8251  0.4169  0.1988   -16.4919   -27.1448   11.1098  11.3663  100.0566  h
19013  182.5003   76.5568  3.6872  0.1123  0.0666   192.2675    93.0302  -62.6192  82.1691  283.4731  h
19014   43.2980   17.3545  2.8307  0.2877  0.1646   -60.1842   -33.8513   -3.6545  78.4099  224.8299  h
19015   21.3846   10.9170  2.6161  0.5857  0.3934    15.2618    11.5245    2.8766   2.4229  106.8258  h
19016   28.9452    6.7020  2.2672  0.5351  0.2784    37.0816    13.1853   -2.9632  86.7975  247.4560  h
19017   75.4455   47.5305  3.4483  0.1417  0.0549    -9.3561    41.0562   -9.4662  30.2987  256.5166  h
19018  120.5135   76.9018  3.9939  0.0944  0.0683     5.8043   -93.5224  -63.8389  84.6874  408.3166  h
19019  187.1814   53.0014  3.2093  0.2876  0.1539  -167.3125  -168.4558   31.4755  52.7310  272.3174  h

Duplicate rows

Most frequently occurring

        fLength    fWidth   fSize   fConc  fConc1      fAsym    fM3Long  fM3Trans   fAlpha     fDist  class  # duplicates
0       12.9176   11.3596  2.1123  0.7413  0.3900    15.0388    -5.6768  -11.5638  64.9330  227.1070  h      2
1       12.9801   10.8815  2.4175  0.7457  0.4723   -13.6970     6.0371   -7.0019  30.8030   78.2618  h      2
2       13.0287   10.9544  2.2000  0.7571  0.4511   -14.0985     5.7807  -10.1748  64.8700  182.9800  h      2
3       14.7912   11.7955  2.3075  0.6749  0.4557     1.3533     4.7675   -9.0611  62.2500   62.5245  h      2
4       16.7566   11.3063  2.3766  0.5840  0.3550     0.0000     0.1543    6.7419  48.5040  117.6360  h      2
5       16.9894   11.0002  2.4564  0.6294  0.3514    -3.4902     8.0823   -7.0516  55.3930   91.3761  h      2
6       18.4343   17.8717  2.3847  0.4866  0.2701   -15.7044   -16.5170  -12.2311  71.0730  158.7030  h      2
7       18.4914    9.7635  2.4829  0.6579  0.3734    -1.8060     7.6520    6.7260  33.8161  188.8670  h      2
8       18.8090   11.1305  2.5496  0.6037  0.4245     0.5645    -3.0608   11.8176  75.5740  222.5910  h      2
9       19.0848   13.7346  2.5849  0.6008  0.3966    17.2824    19.4696   -5.2390   5.3161  213.7140  h      2
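
A table like the one above can be produced by grouping identical rows and keeping the groups that occur more than once. The sketch below does this with the standard library only; the function name and the toy rows are illustrative, and it is an assumption (not stated in this report) that the "# duplicates" column is the size of each group of identical rows.

```python
from collections import Counter


def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Toy rows in the layout (fLength, fWidth, class); the first one is repeated.
rows = [
    (12.9176, 11.3596, "h"),
    (28.7967, 16.0021, "g"),
    (12.9176, 11.3596, "h"),
]
groups = duplicate_groups(rows)

# Percentage quoted in the Overview: 115 duplicate rows out of 19020 observations.
duplicate_pct = round(115 / 19020 * 100, 1)
```

Rows are compared on exact equality of every field, so two measurements that differ only in the last decimal are treated as distinct.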